# BioJulia Series: BED

Tags: bioinformatics, biojulia, doc, julia

See the original text.

## Out of date

This package appears to be somewhat out of date. Because its contents and methods are fairly simple, that probably does not affect day-to-day use yet.

For an introduction to the BED format itself, see the format description.

BED.jl provides I/O operations for the BED format and supports tabix indexes.

## The BED.Record structure

```julia
# copied from https://github.com/BioJulia/BED.jl/blob/2e082b1a9f8c6c543e5544a90a7f970110ca7e6b/src/record.jl#L4
mutable struct Record
    # data and filled range
    data::Vector{UInt8}
    filled::UnitRange{Int}
    # number of columns
    ncols::Int
    # indexes
    chrom::UnitRange{Int}
    chromstart::UnitRange{Int}
    chromend::UnitRange{Int}
    name::UnitRange{Int}
    score::UnitRange{Int}
    strand::Int
    thickstart::UnitRange{Int}
    thickend::UnitRange{Int}
    itemrgb::UnitRange{Int}
    blockcount::UnitRange{Int}
    blocksizes::Vector{UnitRange{Int}}
    blockstarts::Vector{UnitRange{Int}}
end
```

## Reading and writing

The most basic way to read a file:

```julia
using BED

# Input
reader = open(BED.Reader, "file.bed")

# Iterate over the records; each iteration yields a new BED.Record
for rcd in reader
    # Do sth ...
    chrom = BED.chrom(rcd)
    # ...
end

close(reader)
```

This approach allocates a new record for every line that is read, which increases memory use. To save memory, update a single preallocated record in place instead:

```julia
using BED

reader = open(BED.Reader, "file.bed")
record = BED.Record()  # allocated once, reused for every line

while !eof(reader)
    empty!(record)         # reset the record before refilling it
    read!(reader, record)  # overwrite the record with the next line
    # do sth ...
end

close(reader)
```

If you need to access the records in particular intervals repeatedly, it is more efficient to build an `IntervalCollection` (see GenomicFeatures.jl):

```julia
using BED
using GenomicFeatures

# Create an interval collection in memory.
icol = open(BED.Reader, "data.bed") do reader
    IntervalCollection(reader)
end

# Query overlapping records.
for interval in eachoverlap(icol, Interval("chrX", 40001, 51500))
    # A record is stored in the metadata field of an interval.
    record = metadata(interval)
    # ...
end
```
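To make the `BED.Record` layout more concrete: the struct keeps the raw bytes of a line in `data` and stores each field only as a `UnitRange{Int}` pointing into that buffer, so a field is materialized only when it is requested. The sketch below illustrates that idea in plain Julia, without BED.jl. `MiniRecord` and `field` are hypothetical names invented for this illustration, it handles only the first three columns, and it is not BED.jl's actual parser or accessor API.

```julia
# Hypothetical miniature of the "byte buffer + index ranges" layout.
# Not BED.jl code; only the first three BED columns are indexed.
struct MiniRecord
    data::Vector{UInt8}          # raw bytes of one BED line
    chrom::UnitRange{Int}        # where the chrom column sits in `data`
    chromstart::UnitRange{Int}   # where the start column sits in `data`
    chromend::UnitRange{Int}     # where the end column sits in `data`
end

function MiniRecord(line::AbstractString)
    data = Vector{UInt8}(codeunits(line))
    # Positions of the tab separators in the byte buffer.
    tabs = findall(==(UInt8('\t')), data)
    length(tabs) >= 2 ||
        throw(ArgumentError("need at least three tab-separated columns"))
    # The third column ends at the next tab, or at the end of the line.
    stop = length(tabs) >= 3 ? tabs[3] - 1 : length(data)
    return MiniRecord(data, 1:tabs[1]-1, tabs[1]+1:tabs[2]-1, tabs[2]+1:stop)
end

# Materialize a field as a String only when it is asked for.
field(r::MiniRecord, range::UnitRange{Int}) = String(r.data[range])

rec = MiniRecord("chr1\t100\t200\tfeature1")
chrom = field(rec, rec.chrom)
start = parse(Int, field(rec, rec.chromstart))
stop = parse(Int, field(rec, rec.chromend))
println(chrom, " ", start, " ", stop)
```

The values here are the raw column contents with no coordinate conversion. Storing ranges rather than parsed values is also what makes in-place reuse of a record cheap: refilling a record only needs to overwrite the buffer and the ranges.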